To Code or Not to Code Across Time:
Space-Time Coding with Feedback 

Che Lin, Vasanthan Raghavan, and Venugopal V. Veeravalli* 



Abstract 



Space-time codes leverage the availability of multiple antennas to enhance the reliability of
communication over wireless channels. While space-time codes were initially designed with a focus on
open-loop systems, recent technological advances have enabled low-rate feedback from
the receiver to the transmitter. The focus of this paper is on the implications of this feedback in a
single-user multi-antenna system with a general model for spatial correlation. We assume a limited feedback
model, that is, a coherent receiver, and statistics along with $B$ bits of quantized channel information at
the transmitter. We study space-time coding with a family of linear dispersion (LD) codes that meet
an additional orthogonality constraint so as to ensure low-complexity decoding. Our results show that,
when the number of bits of feedback ($B$) is small, a space-time coding scheme that is equivalent to
beamforming and does not code across time is optimal in a weak sense in that it maximizes the average
received SNR. As $B$ increases, this weak optimality transitions to optimality in a strong sense, which is
characterized by the maximization of the average mutual information. Thus, from a system designer's
perspective, our work suggests that beamforming may be attractive not only from a low-complexity
viewpoint, but also from an information-theoretic viewpoint.
I. Introduction

Multipath fading, where the strength of the received signal fluctuates randomly, results in errors 
in signal propagation. These errors can be combated in practice by using multi-antenna diversity 
techniques at the transmitter and the receiver. The focus of this work is on the reliability aspect of 
multi-input multi-output (MIMO) systems under certain assumptions on communication models 
motivated by wireless systems in practice. In particular, we assume a block fading, narrowband 
model for the channel variation in time and frequency, and focus on realistic models for spatial 
correlation and channel state information (CSI) at the transmitter and the receiver. 

In this setting, the low-complexity beamforming scheme has attracted significant theoretical 
attention and has been studied with both perfect CSI at the transmitter [1], as well as with partial 
channel knowledge at the transmitter [2]-[4]. On the other hand, initial works on space-time 
codes assume no CSI at the transmitter and study the reliability of information transmission 
with uncoded inputs, that is, the input symbols are independent from one coherence block to 
another. Reliability can be improved by using certain delay diversity techniques [5] and these 
schemes can be extended to a more general framework, collectively known as space-time trellis 
codes [6]. Though space-time trellis codes are near-optimal in the MIMO setting, they suffer 
from decoding complexity that is exponential with the rate or the number of transmit antennas. To 
overcome these difficulties, orthogonal space-time block codes (OSTBC) [7], [8] that achieve the 
full diversity order¹ of multi-antenna systems and offer the additional advantage of low decoding
complexity have been proposed. 

Maximizing the diversity order, while a useful design criterion, is only applicable to
uncoded transmissions. In practice, a space-time code is used as an inner code in concatenation
with an error correction code that is used as an outer code and is designed² to achieve the maximum
possible rate for a given SNR. Clearly, in this setting, diversity order is not of great importance
because the outer code can exploit the full diversity benefit of the channel by coding across
different space-time coding blocks. Since the outer code helps the concatenated transmitter in
approaching capacity, the mutual information is a meaningful design metric for the inner
space-time block codes if soft decisions are allowed at the space-time decoder. In the no CSI case, it has
been established [10], [11] that OSTBC are also optimal from a mutual information viewpoint.



¹The diversity order of a code is defined as the exponent of the rate at which error probability decays with SNR.
²For example, recent works on low-density parity check codes (see, e.g., [9]) have shown that it is possible to construct outer
codes that come close to achieving the mutual information between the input and the output of the inner space-time code.
The assumption of no CSI at the transmitter is too pessimistic and does not capture reality
where either statistical or partial channel knowledge [12] at the transmitter is possible. In this 
context, when only the statistics are available at the transmitter, it is known that OSTBC are no 
longer optimal within the class of linear space-time codes [13]. In the more practical limited 
feedback case, there have been some recent works [14]-[18] on the design of space-time codes 
with CSI feedback. However, much of this body of work ignores spatial correlation and focuses 
on weighted OSTBC which result in a very restrictive set of linear operations on the inner 
space-time codes. As witnessed in the statistical case [13], spatial correlation can potentially 
lead to significant changes in code design criterion and optimal signaling. Thus, our goal in this 
work is to address optimal signaling with partial channel knowledge at the transmitter and more 
generally, whether coding across time is necessary in partial CSI systems. 

The general framework of linear dispersion (LD) codes, introduced in [19], subsumes all linear 
space-time codes and hence provides a natural framework for studying both beamforming as well 
as space-time code design in a unified way given that partial CSI is available at the transmitter. 
In an LD code, each symbol that is transmitted over the channel is some linear combination 
of the inputs and their complex conjugates and the codes are designed to maximize the mutual 
information between the input and the output of the space-time code. While the generality of 
the LD framework leads to some complications in code design, recent works [20]-[22] show 
that systematic LD code constructions are still possible. 

In this work, we impose an additional Generalized Orthogonal Constraint (GOC) [8], [13]
on the LD codes so that they enjoy the same low decoding complexity³ as OSTBC. That is,
we consider the set of orthogonal LD codes. The search for the optimal orthogonal LD codes
provides significant insight into whether coding across time is necessary.
Contributions: 

• We first show that when there is perfect CSI at the transmitter, the optimal power allocation 
across the different input symbols is uniform. Furthermore, the rank of the optimal LD code 
is one and since rank-one LD codes are equivalent to beamforming, the optimal perfect CSI 
scheme does not code across time. 

• When only statistical information is available at the transmitter, we establish that uniform
symbol power allocation is still optimal. It is also known from [13] that the rank of the
optimal linear space-time code is in general dependent on SNR and channel correlation.
Furthermore, for any correlation, the rank is a non-decreasing function of SNR. Thus the
optimal statistical scheme codes across time, in general.

³Satisfaction of the GOC ensures that the joint maximum-likelihood (ML) decoding of the vector input reduces to individual
ML decoding of the scalar inputs.

• In the partial CSI case, we first establish the optimality of uniform symbol power allocation, 
irrespective of the level of channel knowledge at the transmitter. On the question of spatial 
power allocation, it is natural to expect a smooth transition for the rank of the optimal 
scheme as the quality of channel information at the transmitter gets successively refined 
(that is, as the number of bits of feedback B increases). Surprisingly, we show that rank- 
one schemes enjoy strong optimality properties. When B is sufficiently large (for example, 
$B \geq \log(N_t)$ with $N_t$ denoting the transmit antenna dimension), we show that rank-one
schemes maximize the average mutual information, while in contrast when B is small, 
we show a slightly weaker result: Rank-one schemes maximize the average received SNR. 
While we expect a transition from strong optimality to weak optimality for small values of 
B (as a function of the SNR and the spatial correlation), numerical studies suggest that for 
most practical correlation, this SNR is too large from a practical standpoint. Thus, our results 
suggest that the optimal scheme under the orthogonal LD code framework and quantized 
feedback corresponds to not coding across time. 

• The optimality of rank-one schemes (beamforming) implies that the low-complexity
advantage of scalar coding is also justified in an information-theoretic sense.

Notations: We use $X(i,j)$ and $X(i)$ to denote the $(i,j)$-th and the $i$-th diagonal entries of a matrix
$X$. The conjugate transpose and regular transpose are denoted by $(\cdot)^{\dagger}$ and $(\cdot)^{T}$ while $E[\cdot]$ and
$\mathrm{Tr}(\cdot)$ stand for the expectation and the trace operators, respectively. We say that a singular
value decomposition of a matrix is in its standard ordering if its singular values are arranged in
non-increasing order. Further, if the matrix is Hermitian, $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue.

II. System Setup 

We consider a single-user MIMO communication system with $N_t$ transmit and $N_r$ receive
antennas. The multi-antenna channel matrix experiences fading in time, frequency, and space. In
this paper, we assume a narrowband, block fading model for the channel. That is, the channel is
frequency flat and remains constant across a block of length $N_c$ symbols and fades ergodically
from block to block. With these simple models for the evolution of the channel across time and
frequency, the main focus is on the spatial aspect.
To overcome the impediments of fading, we will consider the design of space-time codes
and view the channel across the block length as corresponding to one channel use. The
discrete-time, complex baseband model under this setting is given by

$$Y = \sqrt{\frac{\rho}{N_t}}\, H X + W$$

where $X \in \mathbb{C}^{N_t \times N_c}$ is the transmitted signal matrix, $Y \in \mathbb{C}^{N_r \times N_c}$ is the received signal matrix,
$H \in \mathbb{C}^{N_r \times N_t}$ corresponds to the channel matrix, and $W \in \mathbb{C}^{N_r \times N_c}$ denotes the complex additive
white Gaussian noise with i.i.d. entries, $W(i,j) \sim \mathcal{CN}(0, 1)$. We assume an average power
constraint on $X$ given by $E[\mathrm{Tr}(XX^{\dagger})] \leq N_t N_c$ that results in a transmit power constraint $\rho$
over each symbol duration.

A. Spatial Correlation 

We now describe the spatial fading framework used in this work. It has been well-documented 
that the assumption of zero-mean Rayleigh fading is an accurate model for H in a non line-of- 
sight setting. Thus the complete channel statistics are described by the second-order moments. 
Rich scattering environments are accurately modeled by the commonly used i.i.d. model where 
the channel entries are i.i.d. $\mathcal{CN}(0, 1)$. However, the i.i.d. model is not accurate in describing
realistic propagation environments. Various statistical models have been proposed to overcome 
the deficiencies associated with the i.i.d. model. 

The most general, mathematically tractable spatial correlation model is based on a
decomposition of the channel onto its canonical coordinates: the eigen-bases of the transmit and the
receive covariance matrices [23]-[25]. The canonical model assumes that the auto- and the
cross-correlation matrices on both the transmitter and the receiver sides have the same eigen-bases,
and exploits this redundancy to decompose $H$ as

$$H = U_r H_{ind} U_t^{\dagger} \qquad (1)$$

where $H_{ind}$ has independent, but not necessarily identically distributed entries. $U_r$ and $U_t$ are
eigenvector matrices corresponding to the receive and the transmit covariance matrices, which
are defined as $E[HH^{\dagger}]$ and $E[H^{\dagger}H]$, respectively. It can be checked [23] that the
model in (1) reduces to some well-known models like the separable correlation model or the
virtual representation framework [26]-[28].
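
The decomposition in (1) can be illustrated with a short sketch. It is only an illustration under stated assumptions: unitary DFT matrices stand in for the eigenvector matrices (as in the virtual representation special case), and the per-entry standard deviations `std` as well as all function names are hypothetical choices, not quantities taken from this paper.

```python
import cmath
import math
import random


def dft_matrix(n):
    """Unitary n x n DFT matrix (assumed stand-in for U_r or U_t)."""
    s = 1.0 / math.sqrt(n)
    return [[s * cmath.exp(-2j * math.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def sample_h_ind(std, rng):
    """Independent zero-mean complex Gaussian entries; entry (i, j) has
    variance std[i][j] ** 2, so the entries need not be identically distributed."""
    half = math.sqrt(0.5)
    return [[s * complex(rng.gauss(0, half), rng.gauss(0, half)) for s in row]
            for row in std]


def canonical_channel(std, rng):
    """Draw H = U_r H_ind U_t^H and return (H, H_ind)."""
    u_r, u_t = dft_matrix(len(std)), dft_matrix(len(std[0]))
    h_ind = sample_h_ind(std, rng)
    return matmul(matmul(u_r, h_ind), herm(u_t)), h_ind
```

Because both bases are unitary, the total channel power of `H` equals that of `H_ind`; only its distribution across antennas changes.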
B. Channel State Information

We will assume perfect CSI at the receiver in this work. However, perfect CSI at the transmitter
is not feasible due to fast time variations of the channel, which lead to a high cost associated with
channel feedback/reverse-link training⁴. Nevertheless, we assume that the channel statistics,
which change much more slowly than the channel realizations, can be learned reliably at both
ends. In practice, besides the statistical information, there is usually a viable low-rate feedback
link from the receiver to the transmitter. Thus recent attention, in both theory and practice, has
shifted towards understanding the implications of partial CSI at the transmitter (most notably, in
the form of limited or quantized feedback [12]) on the performance of communication systems.
In this work, we assume an error-free, negligible-delay limited feedback link where $B$ bits of
channel information are conveyed per channel use.

C. Signaling Scheme - Linear Dispersion Codes 

As mentioned in the introduction, the coding problem for the MIMO channel $H$ can be
separated into the design of an inner space-time block code and an outer code. Accordingly,
input data $x[t]$ is demultiplexed into $K$ data-streams denoted by $x_1[t], \ldots, x_K[t]$ for the
space-time encoder at a given symbol time $t$. We make the following simplifying assumptions on the
input symbols in this work.
Assumption 1:

• The data-streams corresponding to $x_k[t]$ are i.i.d. across time for all $k$ and they are drawn
from some real constellation with marginal distribution $p(x_k)$. The mean of $x_k[t]$ is zero.

• For any $t$ and all $i, j$ such that $i \neq j$, $x_i[t]$ and $x_j[t]$ are independent.

The second assumption can be justified if $x_1[t], \ldots, x_K[t]$ are produced as outputs of
independent scalar outer encoders as in the V-BLAST signaling scheme. Applications involving the use
of bit-interleaved codes at the outer encoder also justify the second assumption. Furthermore,
both assumptions can be justified if the data coming from the encoder is fed through a random
interleaver, a very practical assumption. Since $x_k[t]$ are i.i.d. across time, we will drop the time
index $t$ in the ensuing discussion.



⁴In the case of Time-Division Duplexed (TDD) systems, the reciprocity of the forward and the reverse links can be exploited to
train the channel on the reverse link. In the case of Frequency-Division Duplexed (FDD) systems, the channel information acquired
at the receiver has to be fed back.
While arbitrarily structured space-time coding schemes can be considered for signaling, in
this work, we will focus on a specific LD code-based signaling [19]. The definition of an LD
code involves a set of dispersion matrices $\{A_k\} \in \mathbb{C}^{N_t \times N_c}$ such that the space-time code $X$ is

$$X = \sum_{k=1}^{K} A_k x_k \qquad (2)$$

where the symbols $\{x_k\}_{k=1}^{K}$ satisfy Assumption 1. That is, at a given symbol time, the outer
encoder produces a set of independent symbols $\{x_k\}$ which is then spread across the spatial and
temporal dimensions through $\{A_k\}$.

It is important to note that LD codes encompass all possible linear space-time codes. In
addition, we assume that the class of LD codes satisfies the Generalized Orthogonal Constraint
(GOC), that is, $A_k A_j^{\dagger} + A_j A_k^{\dagger} = 0$ for all $k, j$, $k \neq j$. It has been shown in [8], [10] that the GOC
is equivalent to $p(x_1, \ldots, x_K \mid Y, H_{ind}) = \prod_{k=1}^{K} p(x_k \mid Y, H_{ind})$. That is, the likelihood function
factors and the complexity of the LD decoder is greatly reduced since the joint ML decoding
reduces to individual ML decoding of each symbol. In other words, the channel decouples into
$K$ parallel sub-channels. It is important to note that the decoding complexity for this class of
LD codes, labeled henceforth as orthogonal LD codes, is the same as that achieved by OSTBC.

After normalizing $x_k$ such that $E[x_k^2] = 1$, the power constraint is applied to $A_k$, resulting in
$\sum_{k=1}^{K} \mathrm{Tr}(A_k A_k^{\dagger}) \leq N_t N_c$. The power allocated to the $k$-th symbol is $\mathrm{Tr}(A_k A_k^{\dagger})$.
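
As a concrete instance of the GOC and the power constraint, the sketch below checks both for the familiar Alamouti code written as an LD code with $K = 4$ real symbols, $N_t = N_c = 2$. The choice of the Alamouti code as the example and the helper names are ours; the paper does not single out this code.

```python
import math


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def goc_violation(mats):
    """Largest |entry| of A_k A_j^H + A_j A_k^H over all pairs k != j."""
    worst = 0.0
    for k in range(len(mats)):
        for j in range(k + 1, len(mats)):
            p = matmul(mats[k], herm(mats[j]))
            q = matmul(mats[j], herm(mats[k]))
            for row_p, row_q in zip(p, q):
                for x, y in zip(row_p, row_q):
                    worst = max(worst, abs(x + y))
    return worst


def total_power(mats):
    """sum_k Tr(A_k A_k^H), i.e. the sum of squared Frobenius norms."""
    return sum(abs(x) ** 2 for a in mats for row in a for x in row)


# X = c * [[x1 + i x2, -x3 + i x4], [x3 + i x4, x1 - i x2]]
# (rows: transmit antennas, columns: symbol periods)
c = 1.0 / math.sqrt(2.0)
ALAMOUTI = [
    [[c, 0], [0, c]],            # A_1, carries x1
    [[1j * c, 0], [0, -1j * c]],  # A_2, carries x2
    [[0, -c], [c, 0]],           # A_3, carries x3
    [[0, 1j * c], [1j * c, 0]],  # A_4, carries x4
]
```

Here `goc_violation(ALAMOUTI)` is zero up to rounding and `total_power(ALAMOUTI)` equals $N_t N_c = 4$, with each symbol receiving the same power.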



III. Optimal Signaling Schemes

In this section, we study the problem of optimal LD code construction under different
assumptions on the available CSI at the transmitter. There are four relevant cases of CSI: two
extreme cases of no/perfect CSI at the transmitter, a reasonable assumption where only statistical
information is available, and a case where partial CSI in the form of quantized feedback is
available at the transmitter. The first three cases are the subject of this section, and Sec. IV deals
with the last case in more detail.

A. No CSI at the Transmitter

When no channel information is available at the transmitter, the optimal scheme is to assume
that the channel is i.i.d. Thus, any space-time code tailored to the i.i.d. case can be used. In
particular, Hassibi and Hochwald [19] have applied the mutual information criterion to design
optimal codes (within the class of LD codes) with i.i.d. Gaussian inputs. Jiang [10] has studied
the design of optimal LD codes for i.i.d. channels with binary inputs and conjectured that
the optimal code is the generalized orthogonal design introduced in [8]. Bresler and Hajek [11]
proved the above conjecture and extended the work to arbitrary real inputs. The following sections
demonstrate how channel information can help improve performance.



B. Statistical Information at the Transmitter 

We now study the structure of the optimal LD codes when only statistical information is
available at the transmitter. We build on the recent work in [13] where optimal LD codes are
constructed by maximizing the average mutual information between the input and the output of
the inner code. If channel correlation is modeled with the canonical framework as in (1), we
obtain the following equivalent channel model:

$$\tilde{Y} = \sqrt{\frac{\rho}{N_t}}\, H_{ind} \sum_{k=1}^{K} \tilde{A}_k x_k + \tilde{W} \qquad (3)$$

where $\tilde{A}_k = U_t^{\dagger} A_k$, $\tilde{Y} = U_r^{\dagger} Y$, and $\tilde{W} = U_r^{\dagger} W$. The GOC is equivalent to
$\tilde{A}_k \tilde{A}_j^{\dagger} + \tilde{A}_j \tilde{A}_k^{\dagger} = 0$ for all $k, j$, $k \neq j$, and the original power constraint is equivalent to
$\sum_{k=1}^{K} \mathrm{Tr}(\tilde{A}_k \tilde{A}_k^{\dagger}) \leq N_t N_c$.
We have the following theorem characterizing the structure of optimal LD codes.

Theorem 1: Let $\tilde{X} = U_t^{\dagger} X$ be an LD code as in (2) with $K$ symbols and let the corresponding
dispersion matrices be $\{\tilde{A}_k, k = 1, \ldots, K\}$. Also, let the input symbols $x_1, \ldots, x_K$ satisfy
Assumption 1 and the dispersion matrices satisfy the GOC. If there exists an LD code satisfying
the power constraint condition: $\tilde{A}_k \tilde{A}_k^{\dagger} = \Lambda_{stat}$ for all $k$ where $\Lambda_{stat}$ is a positive semidefinite
diagonal matrix with

$$\Lambda_{stat} = \arg\max_{\Lambda}\; E\left[\varphi\left(\frac{\rho}{N_t}\, \mathrm{Tr}\left(H_{ind} \Lambda H_{ind}^{\dagger}\right)\right)\right] \quad \text{s.t.} \quad \mathrm{Tr}(\Lambda) \leq \frac{N_t N_c}{K},$$

then such a code maximizes the average mutual information $E[I(X; Y \mid \mathbf{H} = H)]$ and achieves
ergodic capacity.

Proof: See Appendix A. ■
Theorem 1 states that uniform symbol power allocation across the data-streams is optimal from
an average mutual information viewpoint. The optimal spatial power allocation is given by $\Lambda_{stat}$,
and in general, $\Lambda_{stat}$ excites multiple modes non-uniformly. We now elaborate on the structure
of $\{\tilde{A}_k\}$. Given that $r$ denotes the number of spatial modes excited by the optimal statistical
scheme, it is straightforward to check that $N_c \geq r$ is necessary. This follows by assuming a
generic singular value decomposition for $\tilde{A}_k$ and checking that the power constraint condition
holds. Furthermore, it can be seen that

$$\tilde{A}_k = \begin{cases} \left[\sqrt{\Lambda_{stat}} \;\; 0\right] V_k^{\dagger} & \text{if } N_c \geq N_t \\ \left[\sqrt{\Lambda_{stat}}\right]_{N_t \times N_c} V_k^{\dagger} & \text{if } r \leq N_c < N_t \end{cases}$$

where $V_k$ is an arbitrary $N_c \times N_c$ unitary matrix and $\left[\sqrt{\Lambda_{stat}}\right]_{N_t \times N_c}$ is the $N_t \times N_c$ principal
sub-matrix of $\sqrt{\Lambda_{stat}}$. With the above structure for $\tilde{A}_k$ and with $\tilde{V}_k$ denoting the $N_c \times r$ principal
sub-matrix of $V_k$, we need

$$\tilde{V}_k^{\dagger} \tilde{V}_j + \tilde{V}_j^{\dagger} \tilde{V}_k = 0 \text{ for all } k \neq j \qquad (4)$$

to meet the GOC. If $rK \leq N_c$, (4) can be met by letting each $\tilde{V}_k$ be a set of $r$ distinct columns
of a random $N_c \times N_c$ unitary matrix. In fact, this choice leads to a stronger condition where
$\tilde{V}_k^{\dagger} \tilde{V}_j = 0$ for any $k \neq j$. Initial studies suggest that $rK \leq 2N_c$ is both necessary and sufficient
for a feasible construction that meets (4). These results and their connections to constructions
via generalized orthogonal designs [8] will be reported elsewhere.

C. Perfect CSI at the Transmitter 

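When perfect CSI is available at both ends, the system equation can be written as

$$Y = \sqrt{\frac{\rho}{N_t}} \sum_{k=1}^{K} H A_k x_k + W \qquad (5)$$

with a power constraint $\sum_{k=1}^{K} \mathrm{Tr}(A_k A_k^{\dagger}) \leq N_t N_c$. Following [10], [13], it can be shown that
we have the following upper bound for the mutual information:

$$I(X; Y \mid \mathbf{H} = H) = I(x_1, \ldots, x_K; Y \mid \mathbf{H} = H) \leq \sum_{k=1}^{K} \underbrace{I\left(x_k;\, \sqrt{\frac{\rho}{N_t}}\, H A_k x_k + W \,\Big|\, \mathbf{H} = H\right)}_{I_k} \qquad (6)$$

Equality in (6) holds if and only if the GOC is satisfied. From [10], [13], we also have

$$I_k = \varphi\left(\frac{\rho}{N_t}\, \mathrm{Tr}\left(H Q_k H^{\dagger}\right)\right) - h(n) \qquad (7)$$

where $Q_k = A_k A_k^{\dagger}$, $\varphi(a) = h(\sqrt{a}\, x_k + n \mid \mathbf{H} = H)$, $h(\cdot)$ denotes the differential entropy, and $n$ is
a real zero-mean Gaussian of variance 1/2. The structure of the optimal LD code is as follows.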

Theorem 2: Let $X$ be an LD code as in (2) with $K$ symbols and let the corresponding
dispersion matrices be $\{A_k, k = 1, \ldots, K\}$. Also, let the input symbols $x_1, \ldots, x_K$ satisfy
Assumption 1. The instantaneous mutual information can be upper bounded as

$$I(X; Y \mid \mathbf{H} = H) \leq K \left[\varphi\left(\frac{\rho N_c}{K}\, \lambda_{\max}(H^{\dagger} H)\right) - h(n)\right] \qquad (8)$$

with equality if and only if $\{A_k\}$ satisfy the GOC and $A_k A_k^{\dagger} = Q_k = Q$ for all $k$ where
$Q = U_H \Lambda_H U_H^{\dagger}$. The matrix $U_H$ is an eigenvector matrix of $H^{\dagger} H$ (in the standard order) and the
only non-zero entry in $\Lambda_H$ is the leading diagonal element whose value is $N_t N_c / K$.

Proof: See Appendix B. ■
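
The bound in (8) can be motivated by the following chain of inequalities. This is only a sketch under the assumption that $\varphi$ is concave and non-decreasing (consistent with its derivative being a mean-squared error that decreases in its argument); it is not a substitute for the proof in Appendix B.

```latex
\begin{align*}
\sum_{k=1}^{K} \varphi\!\Big(\tfrac{\rho}{N_t}\,\mathrm{Tr}\big(H Q_k H^{\dagger}\big)\Big)
 &\le K\,\varphi\!\Big(\tfrac{\rho}{N_t K}\,\textstyle\sum_{k}\mathrm{Tr}\big(H Q_k H^{\dagger}\big)\Big)
 && \text{(Jensen's inequality, $\varphi$ concave)} \\
 &\le K\,\varphi\!\Big(\tfrac{\rho}{N_t K}\,\lambda_{\max}(H^{\dagger}H)\,\textstyle\sum_{k}\mathrm{Tr}(Q_k)\Big)
 && \text{(}\mathrm{Tr}(HQH^{\dagger}) \le \lambda_{\max}(H^{\dagger}H)\,\mathrm{Tr}(Q),\ Q \succeq 0\text{)} \\
 &\le K\,\varphi\!\Big(\tfrac{\rho N_c}{K}\,\lambda_{\max}(H^{\dagger}H)\Big)
 && \text{(}\textstyle\sum_{k}\mathrm{Tr}(Q_k) \le N_t N_c\text{)}
\end{align*}
```

Subtracting $K\,h(n)$ from both ends and combining with (6) and (7) gives the right-hand side of (8). The first step is tight when all $Q_k$ are equal, and the second when each $Q_k$ is supported on the dominant eigenvector of $H^{\dagger}H$.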
The above result shows that the optimal choice of $\{Q_k\}$ is independent of $k$, that is, uniform
symbol power allocation is still optimal. Furthermore, this scheme excites only the dominant
spatial mode. A generic singular value decomposition for $A_k$ shows that it has to satisfy
$A_k = \sqrt{N_t N_c / K}\; U_H[1]\, v_k$ where $U_H[1]$ is $N_t \times 1$ and is the first column of $U_H$, and $v_k$ is a $1 \times N_c$ vector
of unit norm. With this structure, it can also be checked that the GOC can be met if and only
if the $K \times N_c$ matrix $V$ defined as

$$V = \left[v_1^{T} \;\; v_2^{T} \;\; \cdots \;\; v_K^{T}\right]^{T}$$

satisfies $V V^{\dagger} = I_K + iX$ where $X$ is real skew-symmetric⁵. Based on this decomposition, we
can completely characterize the structure of the dispersion matrices $\{A_k\}$ and can establish the
existence of an LD code if and only if $K \leq 2N_c$. The following proposition states this result.

Proposition 1: There exists a $K \times N_c$ matrix $V$ such that $V V^{\dagger} = I_K + iX$ where $X$ is real
skew-symmetric if and only if $K \leq 2N_c$. ■

Due to space constraints, the proof of the claim and the explicit construction of the dispersion
matrices are not reported here. However, we now provide simple illustrations of these
constructions. When $K \leq N_c$, any set of $K$ rows of an arbitrary $N_c \times N_c$ unitary matrix works for $V$.
Since $\{v_k\}$ are pairwise orthogonal, the GOC is naturally met. In fact, it is to be noted that a
stronger condition (than the GOC) holds: $A_k A_j^{\dagger} = 0$ for all $k \neq j$. Under the same conditions as
above, [8] proposes further explicit constructions to meet the GOC. However, these conditions
are only sufficient, but not necessary, as the statement of Prop. 1 illustrates.

The surprising⁶ claim of Prop. 1 is that the GOC can be met and the data-streams separated
temporally as long as $K \leq 2N_c$. The two-fold gain in the maximum possible choice of $K$ (which
is $2N_c$) over the 'naturally' expected limit of $K = N_c$ stems primarily from the weaker condition
that $2\mathrm{Re}(A_k A_j^{\dagger}) = A_k A_j^{\dagger} + A_j A_k^{\dagger} = 0$ for all $k \neq j$, instead of the more stringent condition
that $A_k A_j^{\dagger} = 0$ for all $k \neq j$. For example, when $K = 2N_c$, the condition in Prop. 1 can be
satisfied by choosing

$$v_k = \begin{cases} [\,0 \;\cdots\; 0 \;\; 1 \;\; 0 \;\cdots\; 0\,] \text{ with the } 1 \text{ in position } \frac{k+1}{2} & \text{if } k \text{ is odd,} \\ [\,0 \;\cdots\; 0 \;\; i \;\; 0 \;\cdots\; 0\,] \text{ with the } i \text{ in position } \frac{k}{2} & \text{if } k \text{ is even.} \end{cases} \qquad (9)$$

⁵An $n \times n$ matrix $X$ is said to be real skew-symmetric if it has real entries and satisfies $X^{T} = -X$.

⁶In retrospect, this claim is not all that surprising since the use of real input symbols means that $R$ bits can be transmitted
per complex dimension if a $2^{R}$-ary constellation is used for signaling.
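
The choice in (9) is easy to verify numerically. The sketch below builds the $2N_c \times N_c$ matrix $V$, checks that $VV^{\dagger}$ has unit diagonal, purely imaginary off-diagonal entries and a skew-symmetric imaginary part, and shows how consecutive pairs of real symbols combine into one complex symbol per symbol period. Function names are illustrative only.

```python
def build_v(n_c):
    """Rows v_1, ..., v_{2 n_c}: odd k gives e_{(k+1)/2}, even k gives i * e_{k/2}."""
    rows = []
    for k in range(1, 2 * n_c + 1):
        row = [0j] * n_c
        if k % 2 == 1:
            row[(k + 1) // 2 - 1] = 1 + 0j
        else:
            row[k // 2 - 1] = 1j
        rows.append(row)
    return rows


def gram(v):
    """V V^H for a list-of-rows matrix V."""
    return [[sum(a * b.conjugate() for a, b in zip(r1, r2)) for r2 in v]
            for r1 in v]


def satisfies_prop1(v, tol=1e-12):
    """True if V V^H = I + iX with X real skew-symmetric."""
    g = gram(v)
    for a in range(len(g)):
        for b in range(len(g)):
            target = 1.0 if a == b else 0.0
            if abs(g[a][b].real - target) > tol:
                return False  # real part must be the identity
            if abs(g[a][b].imag + g[b][a].imag) > tol:
                return False  # imaginary part must be skew-symmetric
    return True


def spread(v, x):
    """sum_k v_k x_k for real symbols x; one complex entry per symbol period."""
    return [sum(v[k][m] * x[k] for k in range(len(v)))
            for m in range(len(v[0]))]
```

For instance, `spread(build_v(3), [1, 2, 3, 4, 5, 6])` combines the six real symbols into the three complex entries `1+2j`, `3+4j` and `5+6j`, one per symbol period.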

We are now prepared to state the main result of this section. 

Theorem 3: Consider the family of orthogonal LD codes whose decoding complexity is 
comparable with OSTBC. The mutual information achievable with such codes is a non-decreasing 
function of K which implies that K = 2Nc is necessary for optimal signaling. Furthermore, the 
optimal signaling scheme reduces to beamforming along Uh[1]. More simply stated, not coding 
across time is optimal from an information theoretic perspective. 

Proof: First, note that the GOC has to be met for the case of orthogonal LD codes. From
Prop. 1 we see that $K = 2N_c$ is the largest value of $K$ such that this is possible. With $K = 2N_c$
and $v_k$ as in (9), the average mutual information can be expressed as

$$I(x_1, \ldots, x_K; Y \mid H) = K \varphi\left(\frac{Z}{K}\right) - K h(n)$$

where $Z = \rho N_c \lambda_{\max}(H^{\dagger} H)$. Letting $K$ be a continuous parameter in the previous expression,
we observe that the derivative of the mutual information with respect to $K$ is positive. For this,
note that for any $Z > 0$, we have

$$\varphi\left(\frac{Z}{K}\right) - h(n) = \int_{0}^{Z/K} \varphi'(y)\, dy > \frac{Z}{K}\, \varphi'\left(\frac{Z}{K}\right) \qquad (10)$$

since $\varphi(\cdot)$ is a differentiable function with $\varphi(0) - h(n) = 0$ and $\varphi'(a)$ proportional to the
mean-squared error $\mathrm{mse}(a)$, and hence monotonically decreasing in $a$; see App. B for details. In this setting, we have

$$Y = \sqrt{\frac{\rho}{2}}\, H U_H[1]\, x + W = H U_H[1]\, x_{trans} + W,$$
$$x_{trans} = \sqrt{\frac{\rho}{2}}\, x, \quad x = \left[x_1 + i x_2,\; x_3 + i x_4,\; \cdots,\; x_{2N_c - 1} + i x_{2N_c}\right] \qquad (11)$$

for the system equation. In other words, the optimal signaling scheme reduces to beamforming
the complex symbol $(x_{2k-1} + i x_{2k}) / \sqrt{2}$ along the fixed transmit direction $U_H[1]$ in the $k$-th
symbol period of the coherence block with a transmit energy of $\rho$. The proof is complete. ■
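
The monotonicity in $K$ can be visualized in a special case. The sketch below assumes, purely for illustration, that each $x_k$ is a real standard Gaussian rather than a symbol from a finite constellation as in Assumption 1; with $n$ real Gaussian of variance 1/2 this gives the closed form $\varphi(a) - h(n) = \frac{1}{2}\log_2(1 + 2a)$ bits. The function name is hypothetical.

```python
import math


def mutual_info_gaussian(k, z):
    """K * (phi(Z / K) - h(n)) in bits, under the ASSUMPTION that each x_k is a
    real N(0, 1) input and n is real N(0, 1/2); this input model is an
    illustrative stand-in, not the constellation model of the paper."""
    return 0.5 * k * math.log2(1.0 + 2.0 * z / k)


# Sweep K for a fixed Z = rho * N_c * lambda_max(H^H H) (here Z = 10, arbitrary).
values = [mutual_info_gaussian(k, 10.0) for k in range(1, 9)]
```

In this special case the values increase with $K$ and stay below $Z / \ln 2$, which matches the qualitative claim that spreading the same energy over more real dimensions does not hurt.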



IV. Quantized CSI at the Transmitter 

From the previous section, we see that rank-one signaling (beamforming) is optimal in the 
perfect CSI case while in the statistical case, the rank of the optimal scheme could be greater 
than one, in general. It is natural to expect a smooth transition in the rank as the quality of
CSI gets refined with increasing B. In this section, we show that a rank-one scheme has strong 
optimality properties. This observation is based on the following two results: 1) When B is 
sufficiently large (to be characterized more precisely soon), a rank-one scheme maximizes the 
average mutual information, 2) In the small B regime, we can show that a rank-one scheme 
maximizes the average received SNR. 

We first make precise the notion of a $B$-bit limited feedback scheme in the context of LD
codes. We assume the knowledge of a codebook (of $2^{B}$ codewords) at the transmitter and the
receiver where each codeword is a set of $K$ dispersion matrices satisfying a total power constraint.
That is, the codebook $\mathcal{C}$ is

$$\mathcal{C} = \left\{ \mathcal{C}^{\ell} = \left(A_1^{\ell}, \ldots, A_K^{\ell}\right) : \sum_{k=1}^{K} \mathrm{Tr}\left(A_k^{\ell} A_k^{\ell\dagger}\right) \leq N_t N_c,\; \ell = 1, \ldots, 2^{B} \right\}. \qquad (12)$$

If the $\ell$-th codeword is used in signaling, the system model is described by

$$Y = \sqrt{\frac{\rho}{N_t}} \sum_{k=1}^{K} H A_k^{\ell} x_k + W. \qquad (13)$$

Recall from the previous section that

$$I(x_1, \ldots, x_K; Y \mid \mathbf{H} = H) \leq \sum_{k=1}^{K} I\left(x_k;\, \sqrt{\frac{\rho}{N_t}}\, H A_k^{\ell} x_k + W \,\Big|\, \mathbf{H} = H\right) = \sum_{k=1}^{K} \varphi\left(\frac{\rho}{N_t}\, \mathrm{Tr}\left(H Q_k^{\ell} H^{\dagger}\right)\right) - K h(n) \qquad (14)$$

where the upper bound is met if $\{A_k^{\ell}\}$ satisfies the GOC, $Q_k^{\ell} = A_k^{\ell} A_k^{\ell\dagger}$, and $h(n)$ is defined
as in (7). Thus, the mutual information is completely characterized by the set of covariance
matrices $(Q_1^{\ell}, \ldots, Q_K^{\ell})$. Over each coherence block, the receiver feeds back $\ell^{*}$, the index of
the optimal codeword which maximizes the instantaneous mutual information, to the transmitter,
and the transmitter communicates over the remaining symbols in the coherence block according
to (13) with dispersion matrices $\{A_k^{\ell^{*}}\}$. We now show that uniform symbol power allocation is
optimal even in the partial CSI case.

Proposition 2: Let the $B$-bit quantized feedback system be described as in (13). For any
choice of $B$, the average mutual information is maximized by a codebook that allocates uniform
power to input symbols. In fact, for any codeword index $\ell$, we have $Q_k^{\ell} = Q^{\ell}$ for all $k$.

Proof: See Appendix C. ■

Thus, from the above proposition, we only need to quantize $Q^{\ell}$. The most natural quantization
for a covariance matrix is based on an eigen-decomposition of $Q^{\ell}$. For this, we let $N_1$ and $N_2$
be such that $N_1 N_2 = 2^{B}$, and quantize as

$$Q^{\ell} = Q^{(i-1)N_2 + j} = U_i \Lambda_j U_i^{\dagger} \triangleq Q^{i,j}, \quad i = 1, \ldots, N_1 \text{ and } j = 1, \ldots, N_2 \qquad (15)$$

where $\{U_i\}$ are unitary and $\{\Lambda_j\}$ are positive semi-definite diagonal with $\mathrm{Tr}(\Lambda_j) \leq N_t N_c / K$.
While the above quantization seems natural, it is unclear how to allocate $B$ into $N_1$ and $N_2$
optimally. In the ensuing discussion, we consider two cases and study the optimality of rank-one
codebooks.
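
The bookkeeping behind (15) can be sketched as follows: $B$ feedback bits are split between the unitary component ($N_1$ choices) and the power allocation component ($N_2$ choices), and the codeword index $\ell$ is mapped to and from the pair $(i, j)$. The function names and the bit-split parameter `bits_for_power` are hypothetical; since $N_1 N_2 = 2^{B}$ with integer factors, both factors are powers of two.

```python
def split_bits(total_bits, bits_for_power):
    """Return (N1, N2) with N1 * N2 = 2 ** total_bits, where bits_for_power
    bits index the power allocation Lambda_j and the rest index U_i."""
    if not 0 <= bits_for_power <= total_bits:
        raise ValueError("bits_for_power must lie between 0 and total_bits")
    return 2 ** (total_bits - bits_for_power), 2 ** bits_for_power


def to_index(i, j, n2):
    """Codeword index l = (i - 1) * N2 + j for 1-based i and j."""
    return (i - 1) * n2 + j


def from_index(ell, n2):
    """Inverse map: recover the 1-based pair (i, j) from the 1-based index l."""
    i, j = divmod(ell - 1, n2)
    return i + 1, j + 1
```

With `split_bits(4, 1)` the codebook has $N_1 = 8$ unitary matrices and $N_2 = 2$ power allocations, and `from_index(to_index(i, j, 2), 2)` recovers `(i, j)` for every pair.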

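A. Strong Optimality of Rank-One Codebooks when $N_2 \geq N_t$

Since the rank of $Q^{i,j}$ is that of $\Lambda_j$, we fix $\{U_i\}_{i=1,\ldots,N_1}$ to be a known family and optimize
over all possible $\{\Lambda_j\}$. This results in the following optimization:

$$\{\Lambda_j^{*}\} = \arg\max_{\{\Lambda_j \,:\, \mathrm{Tr}(\Lambda_j) \leq N_t N_c / K\}} E\left[\max_{i,j}\, \varphi\left(\frac{\rho}{N_t}\, \mathrm{Tr}\left(H U_i \Lambda_j U_i^{\dagger} H^{\dagger}\right)\right)\right]. \qquad (16)$$

We say that rank-one codebooks are strongly optimal if a codebook of rank-one power allocations
is sufficient to maximize the average mutual information for any choice of $\{U_i\}$.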

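Theorem 4: If $N_2 \geq N_t$, a rank-one codebook is strongly optimal.
Proof: For each realization of the channel $\mathbf{H} = H$, we seek to maximize the instantaneous
mutual information by choosing the optimal codeword at the receiver and feeding back the index
$(i^{*}, j^{*})$ to the transmitter through the feedback link. For this, note that for any fixed $\{U_i\}$,

$$\begin{aligned}
\max_{i,j}\, \varphi\left(\frac{\rho}{N_t}\, \mathrm{Tr}\left(H U_i \Lambda_j U_i^{\dagger} H^{\dagger}\right)\right)
&\overset{(a)}{=} \max_{i,j}\, \varphi\left(\frac{\rho}{N_t}\, \mathrm{Tr}\left(S_i \Lambda_j S_i^{\dagger}\right)\right) \\
&\overset{(b)}{=} \max_{i,j}\, \varphi\left(\frac{\rho}{N_t} \sum_{m=1}^{N_t} \Lambda_j(m)\, \|S_{im}\|^{2}\right) \\
&\overset{(c)}{=} \max_{i,j}\, \varphi\left(\frac{\rho N_c}{K} \sum_{m=1}^{N_t} \alpha_{jm}\, s_{im}^{2}\right) \\
&\overset{(d)}{\leq} \varphi\left(\frac{\rho N_c}{K}\, s_{i^{*}m^{*}}^{2}\right)
\end{aligned}$$

where (a) follows by defining $S_i = \Lambda_H^{1/2} U_H^{\dagger} U_i$ and $H^{\dagger} H = U_H \Lambda_H U_H^{\dagger}$, (b) and (c) follow by
denoting the $m$-th column of $S_i$ by $S_{im}$, its norm by $s_{im}$, $\alpha_{jm} = \frac{K \Lambda_j(m)}{N_t N_c}$, $\sum_{m} \alpha_{jm} \leq 1$, and
(d) follows by letting $(i^{*}, m^{*}) = \arg\max_{1 \leq i \leq N_1,\, 1 \leq m \leq N_t} s_{im}$. If $N_2 \geq N_t$, we can consider a
distinct set of $N_t$ rank-one power allocations each of which excites only one mode. Using this
set in the above framework allows us to meet the upper bound. ■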
Note that the condition $N_2 \geq N_t$ implies that $2^{B} = N_1 N_2 \geq N_t$. But this inequality does
not impose any constraint on $N_1$. Nevertheless, we can say that if $B$ is sufficiently large
($B \geq \log(N_t)$) so that at least $\log(N_t)$ bits can be allocated to quantize the power allocation component
of $Q^{\ell}$, rank-one codebooks are always optimal irrespective of the constellation of input symbols,
SNR, channel correlation, etc. This optimality is not completely surprising since the quantized
feedback system closely approximates a perfect feedback system when $B \gg \log(N_t)$.
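
The receiver-side selection implied by this discussion can be sketched as follows. For a given channel realization, the receiver evaluates the gain $\|H U_i[m]\|^{2}$ of every column of every candidate basis and feeds back the best pair; any other power allocation with the same trace budget does no better, which mirrors step (d) in the proof of Theorem 4. The bases used here (identity and DFT), the unit budget, and all function names are illustrative assumptions.

```python
import cmath
import math
import random


def dft_matrix(n):
    """Unitary n x n DFT matrix, used as one assumed candidate basis U_i."""
    s = 1.0 / math.sqrt(n)
    return [[s * cmath.exp(-2j * math.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def column_gain(h, u, m):
    """||H u_m||^2 for the m-th column u_m of the unitary matrix U."""
    return sum(abs(sum(row[t] * u[t][m] for t in range(len(u)))) ** 2
               for row in h)


def metric(h, u, lam):
    """Tr(H U diag(lam) U^H H^H) = sum_m lam[m] * ||H u_m||^2."""
    return sum(lam[m] * column_gain(h, u, m) for m in range(len(lam)))


def best_rank_one(h, bases, budget):
    """Return (value, i, m) maximising budget * ||H U_i[m]||^2 (0-based i, m)."""
    return max((budget * column_gain(h, u, m), i, m)
               for i, u in enumerate(bases) for m in range(len(u)))
```

Since $\varphi$ is non-decreasing, maximizing this trace metric also maximizes the per-symbol mutual information term in (14) over the candidate set.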
B. Weak Optimality of Rank-One Codebooks when $N_2 < N_t$

From the notation of Theorem 4, when $N_2 < N_t$ we can rewrite the optimization in (16) as

$$\max_{\{\alpha_{jm} \,:\, \sum_{m} \alpha_{jm} \leq 1 \text{ for all } j\}} E\left[\max_{i,j}\, \varphi\left(\frac{\rho N_c}{K} \sum_{m=1}^{N_t} \alpha_{jm}\, s_{im}^{2}\right)\right]. \qquad (17)$$

Direct optimization of (17) requires the exact distribution function of $s_{im}$, which is a complicated
function of the spatial correlation, thus rendering the above problem intractable. We now consider
an alternate formulation of the above problem wherein the objective function is the minimization
of $\Delta I$ with

$$\Delta I \triangleq E\left[\varphi\left(\frac{\rho N_c}{K}\, \lambda_{\max}(H^{\dagger} H)\right) - \max_{i,j}\, \varphi\left(\frac{\rho N_c}{K} \sum_{m=1}^{N_t} \alpha_{jm}\, s_{im}^{2}\right)\right].$$

That is, the objective is to minimize the difference in average mutual information between the
perfect CSI benchmark and a quantized feedback scheme. We now propose an upper bound for
$\Delta I$ that renders the study of optimal signaling tractable in a weak sense.

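Lemma 1: The quantity $\Delta I$ can be upper bounded by $\Delta\mathrm{SNR}$, the difference in average received SNR, defined as

$$\Delta\mathrm{SNR} \triangleq \frac{\rho N_c}{K}\, E_H\left[ \lambda_{\max}(H^\dagger H) - \max_{i,j} \sum_{m=1}^{N_t} \alpha_{jm} s_{im} \right].$$

Proof: See Appendix D. ■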
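Thus, in a weak sense, the optimization in (17) is equivalent to maximizing the average received SNR of the quantized feedback scheme:

$$\max_{\{\alpha_{jm} \,:\, \sum_{m} \alpha_{jm} \le 1 \text{ for all } j\}} E_H\left[ \frac{\rho N_c}{K} \max_{i,j} \sum_{m=1}^{N_t} \alpha_{jm} s_{im} \right]. \qquad (18)$$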



With this new metric, we now establish the optimality of rank-one codebooks. 

Theorem 5: Let the $B$-bit quantized feedback system and the corresponding $N_1$ and $N_2$ be described as before. If $N_2 < N_t$, the average SNR at the receiver is maximized by a rank-one codebook.

Proof: See Appendix E. ■
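As an illustration of Theorem 5 in the simplest case $N_2 = 1$, the sketch below compares the sample-average SNR gain of a rank-two allocation with that of the best single-mode allocation. It is a minimal sketch assuming NumPy, an i.i.d. complex Gaussian channel, and unitaries drawn by QR factorization; the function names are hypothetical, and the common scale factor $\rho N_c / K$ is omitted since it does not affect the comparison.

```python
import numpy as np

def sample_mode_gains(trials, n_t, n_r, unitaries, rng):
    """Return S with S[t, i, m] = s_im for the t-th i.i.d. channel draw."""
    S = np.empty((trials, len(unitaries), n_t))
    for t in range(trials):
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        G = H.conj().T @ H
        for i, U in enumerate(unitaries):
            S[t, i] = np.real(np.diag(U.conj().T @ G @ U))
    return S

def average_gain(S, alpha):
    """Sample mean of max_i sum_m alpha_m * s_im (single allocation, N_2 = 1)."""
    return float(np.mean(np.max(S @ alpha, axis=1)))

rng = np.random.default_rng(1)
n_t = n_r = 4
unitaries = []
for _ in range(4):  # N_1 = 4 (illustrative)
    z = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
    unitaries.append(np.linalg.qr(z)[0])

S = sample_mode_gains(2000, n_t, n_r, unitaries, rng)
rank_one_gains = [average_gain(S, e) for e in np.eye(n_t)]
rank_two_gain = average_gain(S, np.array([0.5, 0.5, 0.0, 0.0]))
```

The inequality `rank_two_gain <= max(rank_one_gains)` holds for the sample averages exactly, not just in expectation, because the bounding steps in the proof apply to any distribution on the channel, including the empirical one.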
It is important to note that the optimality of a rank-one codebook in terms of the new metric does not necessarily imply the optimality of rank-one codebooks in terms of the average mutual information. For any choice of correlation and $B$ that is comparable to $\log(N_t)$, we expect a natural transition from strong optimality to weak optimality as SNR increases. Nevertheless, numerical studies suggest that this transition SNR is very large for most reasonable correlation, so that, for practical purposes, rank-one codebooks remain optimal. This will be the focus of our future work. Furthermore, following the approach in Theorem 3, the system equation reduces to

$$Y = H U_{i^*} \Lambda_{j^*}^{1/2} X_{\mathrm{trans}} + W \qquad (19)$$

where $X_{\mathrm{trans}}$ is as defined earlier and, for any given realization $\mathbf{H} = H$, $Q = U_{i^*} \Lambda_{j^*} U_{i^*}^\dagger$ with $\Lambda_{j^*} = \frac{N_t N_c}{K}\, \mathrm{diag}(e_{j^*})$, where $e_{j^*}$ is the $j^*$-th standard basis vector of $\mathbb{R}^{N_t}$.



It is critical to note the difference between a beamforming scheme in the classical sense and the system equation in (19). In the classical sense, the beamforming direction is fixed and independent of the channel state, but perhaps dependent on the channel statistics, which evolve over slower time scales. In (19), the beamforming direction is dependent on the channel state and is based on the feedback information. Despite the adaptation of this direction in response to the reverse link feedback, the low-complexity gain associated with beamforming (in the classical sense) can be accrued because we still need only a single radio link chain to implement this scheme. The need to adapt the beamforming direction at the transmitter at a fast rate$^1$ may impose additional constraints on the hardware, but these are expected to be sub-dominant in comparison with the performance improvement obtained by utilizing the feedback information.

To summarize, the main conclusion of this work is that coding across time is not necessary to maximize the average mutual information if the quality of CSI feedback is sufficiently good. The same conclusion holds with a low quality of CSI feedback if the objective is relaxed to that of maximizing the average received SNR.



V. Simulation Results 

We now present numerical studies to demonstrate that the rank-one codebook is a reasonable 
choice for most practical scenarios of interest. We study three settings here: 1) a 2 x 2 i.i.d. 
channel, 2) a 4 x 4 i.i.d. channel, and 3) a 4 x 4 correlated channel with variance of channel 
entries given by 

0.1 0.4 
0.1 0.4 
0.4 0.4 
0.4 0.4 

In all the cases, the channel power $E[\mathrm{Tr}(HH^\dagger)]$ is normalized to $N_t N_r$. For all cases, we compare the mutual information between the best rank-one and 'best' rank-two codebooks. The cases studied are: a) $B = 2$, $N_1 = 4$, $N_2 = 1$, and b) $B = 2$, $N_1 = 2$, $N_2 = 2$. We now elaborate on how to obtain the best rank-one and rank-two codebooks.



V4 



16 
Z6 



$^1$The rate has to be slightly faster than the rate at which the channel evolves.
From (15), the structure of each codeword is $Q^{i,j} = U_i \Lambda_j U_i^\dagger$. Fixing $\{U_i\}$, the different rank-one codebooks are characterized by different choices of rank-one $\{\Lambda_j\}$. For example, with $N_r = N_t = 4$, $N_1 = 4$ and $N_2 = 1$, there are four choices of rank-one codebooks: $\Lambda_1 = \mathrm{diag}(e_i)$, where $e_i$ is the $i$-th standard basis vector of $\mathbb{R}^4$. Similarly, there are six possible rank-one choices for $N_r = N_t = 4$, $N_1 = 2$ and $N_2 = 2$. The best rank-one codebook is the one that maximizes the average mutual information. The above procedure is difficult to extend to rank-two codebooks. This is because even though there are only finitely many choices for the positions of the modes that can be excited, the power allocation between the two excited modes can run through a continuum. For example, with $N_r = N_t = 4$, $N_1 = 4$ and $N_2 = 1$, $\Lambda_1 = \mathrm{diag}(1/2, 1/2, 0, 0)$, $\Lambda_1 = \mathrm{diag}(1/3, 2/3, 0, 0)$, or $\Lambda_1 = \mathrm{diag}(0, 2/3, 0, 1/3)$ are all feasible choices for rank-two codebooks. This difficulty forces us to study this case by randomly generating 50 different sets of $\{\Lambda_j\}$ and picking the 'best' rank-two codebook from this random set. Further, since there is no proper distance metric to pack unitary matrices, a random family of $\{U_i\}$ is generated via random vector quantization (RVQ). Numerical studies show very similar performance with different choices of $\{U_i\}$ and hence, only one such choice is highlighted.
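The codebook construction described above can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' simulation code: the function names are hypothetical, the count of rank-one choices is assumed to correspond to picking $N_2$ distinct modes out of $N_t$, and the RVQ draw is assumed to be a Haar-distributed unitary obtained from the QR factorization of a complex Gaussian matrix with a phase correction.

```python
from itertools import combinations
import numpy as np

def rank_one_allocation_sets(n_t, n_2):
    """Enumerate sets of n_2 distinct single-mode allocations diag(e_m)."""
    basis = np.eye(n_t)
    return [[np.diag(basis[m]) for m in modes]
            for modes in combinations(range(n_t), n_2)]

def random_rank_two_allocation(n_t, rng):
    """Split unit power at random between two randomly chosen modes."""
    modes = rng.choice(n_t, size=2, replace=False)
    p = rng.uniform(0.0, 1.0)
    lam = np.zeros(n_t)
    lam[modes[0]], lam[modes[1]] = p, 1.0 - p
    return np.diag(lam)

def rvq_unitary(n_t, rng):
    """Haar-distributed unitary: QR of a complex Gaussian with phase fix."""
    z = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale each column by a unit-modulus phase

rng = np.random.default_rng(0)
four_choices = rank_one_allocation_sets(4, 1)   # N_1 = 4, N_2 = 1
six_choices = rank_one_allocation_sets(4, 2)    # N_1 = 2, N_2 = 2
rank_two_candidates = [random_rank_two_allocation(4, rng) for _ in range(50)]
U = rvq_unitary(4, rng)
```

The enumeration yields four and six rank-one choices for the two cases, in line with the counts quoted in the text, while the rank-two candidates can only be sampled because the power split is continuous.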

In the simulations, the choice of $K$ used is $N_c$. This is because while rank-one codebooks meeting the GOC exist for up to $K = 2N_c$, the study in Sec. III-B suggests that rank-two codebooks that meet the GOC may not exist. We illustrate our results with Gaussian inputs, but numerical studies show that the input constellation plays a minimal role in the trends.

Fig. 1 plots the mutual information with the best rank-one and rank-two codebooks for $N_1 = 4$, $N_2 = 1$, and for $N_1 = 2$, $N_2 = 2$. Benchmark plots of the perfect CSI (upper bound), only statistical information (lower bound), and statistical beamforming (lower bound) are also presented. A magnified view of this plot in Fig. 2 shows that the best rank-one codebook outperforms all the rank-two codebooks. In all subsequent plots, we focus only on a magnified view of the comparison between rank-one and rank-two codebooks since all plots show very similar trends for the mutual information and the main focus is on our conjecture that a rank-one scheme leads to good performance in practice. Fig. 3 and Fig. 4 both verify that a rank-one codebook outperforms rank-two codebooks in the $4 \times 4$ i.i.d. and $4 \times 4$ correlated channels, respectively, thus suggesting that in most practical scenarios of interest, beamforming is a good candidate for optimal signaling.
VI. Conclusion

In this work, we have studied the cases of coding across space and across space-time in a 
unified fashion by considering a family of linear dispersion codes that satisfy an orthogonality 
constraint. Our results show that there is no need to code across time either when the channel 
information at the transmitter is perfect or when the channel information is of a sufficiently 
good quality. On the other hand, even when the channel information is not of a good quality 
(corresponding to low rates of feedback), the low-complexity beamforming scheme possesses 
some attractive optimality properties, namely, it maximizes the average received SNR. From a design viewpoint, beamforming is particularly attractive: its low design complexity is augmented by the low cost ensured by using a single radio link chain.

Note that the orthogonal LD codes are of a complexity comparable to the OSTBC which 
are commonly used in standardization efforts. However, even in the case of rank-one signaling, 
one may be able to send K > 2Nc data-streams with dispersion matrices that do not meet the 
orthogonality constraint. The obvious disadvantage of this strategy is that the data-streams may have to be separated at the receiver with more complex decoding architectures. Moreover, the objective of maximizing the average mutual information of the inner space-time code can be met by precoding schemes that multiplex more than one data-stream, albeit at the cost of some decoding complexity. Thus, the trade-off between mutual information and decoding complexity remains unclear. Furthermore, our work provides a good justification as to why there has been significant recent attention on limited feedback precoding/beamforming schemes [2], [3], [29]-[33] rather than on limited feedback space-time coding schemes.

Much work needs to be done to understand how these results translate to more practical scenarios of interest where the channel information at the receiver or the statistical information at the transmitter may not be perfect, the channel is not block fading, the channel is wideband, etc. Construction of dispersion matrices that satisfy desired low-complexity properties is another area of interest. It
is also important to note that we have only scratched the surface on understanding the trade-off 
between reliability and throughput with constraints on the complexity of the encoder-decoder pair. 
While reliability is an important design metric in certain situations, throughput (more coarsely 
identified as the 'multiplexing gain') is probably a more important aspect in the design of high 
data-rate wireless systems. In such settings, it is of interest to understand how and when low- 
complexity, adaptive signaling techniques can be leveraged to achieve near-optimal performance. 
Appendix

A. Proof of Theorem 1

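Denote $A_k A_k^\dagger$ by $Q_k$ and $\frac{1}{K}\sum_{k=1}^{K} Q_k$ by $Q$, and observe that $Q$ is a positive semi-definite matrix with trace constrained by $N_t N_c / K$. With $\gamma = E[I(X; Y | \mathbf{H} = H)] + K\, E[h(n)]$, we have

$$\begin{aligned}
\gamma &\overset{(a)}{\le} E\left[ \sum_{k=1}^{K} I_k \right] + K\, E[h(n)] = E\left[ \sum_{k=1}^{K} \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(Q_k H^\dagger H) \right) \right] \\
&\overset{(b)}{\le} K \cdot E\left[ \varphi\left( \frac{\rho}{N_t K} \sum_{k=1}^{K} \mathrm{Tr}(Q_k H^\dagger H) \right) \right] = K \cdot E\left[ \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(Q H^\dagger H) \right) \right]
\end{aligned}$$

where equality holds in (a) if the GOC condition is satisfied and (b) follows from the concavity of $\varphi(\cdot)$. Optimizing over the choice of $Q$ in the above equation results in an upper bound for $E[I(X; Y | \mathbf{H} = H)]$. Denote by $Q_{\mathrm{opt}}$ the solution to the following optimization problem: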



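$$Q_{\mathrm{opt}} = \arg\max_{Q \,:\, \mathrm{Tr}(Q) = N_t N_c / K} E\left[ \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(H Q H^\dagger) \right) \right] \qquad (20)$$

where $Q = AA^\dagger$. A choice of dispersion matrices $\{A_k\}$ that satisfies $A_k A_k^\dagger = Q_k = Q_{\mathrm{opt}}$ for all $k$ and that meets the GOC condition would result in an equality in the upper bound and hence achieves the ergodic capacity. The fact that $Q_{\mathrm{opt}}$ in (20) is diagonal follows from [13]. ■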



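B. Proof of Theorem 2

The connection between minimum mean squared error (MMSE) estimation and mutual information established in [34] implies that $\frac{d\varphi(\alpha)}{d\alpha} = \mathrm{mse}(\alpha)$, where $\mathrm{mse}(\alpha)$ is the mean squared error for the channel under consideration at an SNR of $\alpha$. The positivity and the monotonic decrease of the $\mathrm{mse}(\cdot)$ function imply that $\varphi(\cdot)$ (and hence $I_k(\cdot)$) is concave and non-decreasing. We first upper bound $I_k$, and this leads to an upper bound on $I(X; Y | \mathbf{H} = H)$. For this, note that

$$I_k + h(n) = \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(Q_k H^\dagger H) \right) = \varphi\left( \frac{\rho}{N_t} \sum_{i} \lambda_i(Q_k H^\dagger H) \right) \overset{(a)}{\le} \varphi\left( \frac{\rho}{N_t} \lambda_1(H^\dagger H)\, \mathrm{Tr}(Q_k) \right) \qquad (21)$$

where (a) follows from the fact that $\lambda_i(Q_k H^\dagger H) \le \lambda_i(Q_k)\, \lambda_1(H^\dagger H)$ and the monotonicity of $\varphi(\cdot)$. The concavity of $\varphi$ implies that $\eta = \sum_{k=1}^{K} I_k + K h(n)$ satisfies

$$\eta \le \sum_{k=1}^{K} \varphi\left( \frac{\rho}{N_t} \lambda_1(H^\dagger H)\, \mathrm{Tr}(Q_k) \right) \le K \varphi\left( \frac{\rho}{N_t K} \lambda_1(H^\dagger H) \sum_{k=1}^{K} \mathrm{Tr}(Q_k) \right) \le K \varphi\left( \frac{\rho N_c}{K} \lambda_1(H^\dagger H) \right). \qquad (22)$$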



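We now show that the upper bound is in fact achievable. Consider the maximization of $\sum_k I_k$ over the set $\mathcal{Q} = \{Q_k = Q \text{ for all } k,\ Q \succeq 0 \text{ and } \mathrm{Tr}(Q) = N_t N_c / K\}$. We then have $u = \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(H Q H^\dagger) \right)$ satisfying

$$u = \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(Q H^\dagger H) \right) \overset{(a)}{\le} \varphi\left( \frac{\rho}{N_t} \sum_{i=1}^{N_t} \lambda_i(Q)\, \lambda_i(H^\dagger H) \right) \overset{(b)}{\le} \varphi\left( \frac{\rho N_c}{K} \lambda_1(H^\dagger H) \right)$$

where (a) follows from the monotonicity of $\varphi(\cdot)$ and the fact that if $A$ and $B$ are $n \times n$ positive semi-definite matrices, then $\sum_{i=1}^{n} \lambda_i(AB) \le \sum_{i=1}^{n} \lambda_i(A)\, \lambda_i(B)$, and (b) follows from trivially upper bounding $\lambda_i(H^\dagger H)$ with $\lambda_1(H^\dagger H)$. Also note that the upper bound is achieved by beamforming along the dominant eigen-direction of $H^\dagger H$. Thus, we have

$$I(X; Y | \mathbf{H} = H) \le \sum_{k=1}^{K} I_k \le K \varphi\left( \frac{\rho N_c}{K} \lambda_1(H^\dagger H) \right) - K h(n) \qquad (23)$$

with equality if and only if $Q$ is as above and the GOC conditions are satisfied. ■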
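The achievability step can be illustrated numerically: among positive semi-definite $Q$ with a fixed trace, placing all power along the dominant eigenvector of $H^\dagger H$ attains $\mathrm{Tr}(Q H^\dagger H) = \mathrm{Tr}(Q)\, \lambda_1(H^\dagger H)$. The sketch below assumes NumPy; the function names and the value of `power` (standing in for $N_t N_c / K$) are hypothetical.

```python
import numpy as np

def dominant_beamformer(H, power):
    """Rank-one Q with Tr(Q) = power along the dominant eigenvector of H^dagger H."""
    eigvals, eigvecs = np.linalg.eigh(H.conj().T @ H)  # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return power * np.outer(v, v.conj()), float(eigvals[-1])

def random_psd_with_trace(n, power, rng):
    """Random positive semi-definite matrix scaled to have trace `power`."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q = a @ a.conj().T
    return power * q / np.real(np.trace(q))

rng = np.random.default_rng(2)
n_t, n_r, power = 4, 4, 2.0
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
G = H.conj().T @ H

Q_bf, lam1 = dominant_beamformer(H, power)
gain_bf = float(np.real(np.trace(Q_bf @ G)))
gain_other = float(np.real(np.trace(random_psd_with_trace(n_t, power, rng) @ G)))
```

Since $\varphi(\cdot)$ is non-decreasing, maximizing $\mathrm{Tr}(Q H^\dagger H)$ also maximizes each $I_k$, which is why the comparison is made on the trace alone.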



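C. Proof of Theorem 3

First, note that any (generic) codebook can be written as

$$C = \{c^\ell,\ \ell = 1, \cdots, 2^B\} \text{ where } c^\ell \triangleq (Q_1^\ell, \cdots, Q_K^\ell) \text{ s.t. } \sum_{k=1}^{K} \mathrm{Tr}(Q_k^\ell) \le N_t N_c.$$

Further, define a codebook $D$ as

$$D = \{d^\ell,\ \ell = 1, \cdots, 2^B\} \text{ where } d^\ell \triangleq (Q^\ell, \cdots, Q^\ell) \text{ s.t. } \mathrm{Tr}(Q^\ell) \le \frac{N_t N_c}{K}.$$

Denoting the families of codebooks of the type $C$ and $D$ by $\mathcal{C}$ and $\mathcal{D}$, respectively, we have $\mathcal{D} \subset \mathcal{C}$. With a codebook $C$ from $\mathcal{C}$, the average mutual information (up to the additive noise-entropy term $K h(n)$) satisfies

$$E_H\left[ \max_{\ell} \sum_{k=1}^{K} \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(H Q_k^\ell H^\dagger) \right) \right] \overset{(a)}{\le} E_H\left[ \max_{\ell}\ K \varphi\left( \frac{\rho}{N_t} \mathrm{Tr}(H Q_*^\ell H^\dagger) \right) \right]$$

where $Q_*^\ell = \frac{1}{K}\sum_{k=1}^{K} Q_k^\ell$ satisfies $\mathrm{Tr}(Q_*^\ell) \le \frac{N_t N_c}{K}$ and (a) follows from the concavity of $\varphi(\cdot)$. Thus, the average mutual information with a codebook $C$ can be upper bounded by an appropriately generated codebook from $\mathcal{D}$. Since $\mathcal{D} \subset \mathcal{C}$, the upper bound is tight. ■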



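D. Proof of Lemma 1

From the fundamental theorem of calculus and the MMSE connection in App. B, we have

$$\Delta I = E_H\left[ \int_A^B \mathrm{mse}(x)\, dx \right]$$

where $A = \frac{\rho N_c}{K} \max_{i,j} \sum_{m=1}^{N_t} \alpha_{jm} s_{im}$, $B = \frac{\rho N_c}{K} \lambda_{\max}(H^\dagger H)$, and $\mathrm{mse}(\cdot)$ is the MSE function. Note that $A \le B$. Since a Gaussian input maximizes the $\mathrm{mse}(\cdot)$ function [34], an upper bound to $\Delta I$ can be obtained by replacing the entropy function $\varphi(x)$ with that corresponding to a Gaussian input (or equivalently, $\log(1 + x)$). Thus, we have

$$\begin{aligned}
\Delta I &\le E_H\left[ \log(1 + B) - \log(1 + A) \right] = E_H\left[ \log\left( 1 + \frac{B - A}{1 + A} \right) \right] \\
&\overset{(a)}{\le} E_H\left[ \frac{B - A}{1 + A} \right] \overset{(b)}{\le} \frac{\rho N_c}{K}\, E_H\left[ \lambda_{\max}(H^\dagger H) - \max_{i,j} \sum_{m=1}^{N_t} \alpha_{jm} s_{im} \right]
\end{aligned}$$

where (a) follows from the inequality $\log(1 + x) \le x$ and (b) follows trivially since $A \ge 0$.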
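The chain of inequalities for the Gaussian-input bound can be verified on arbitrary values of $A$ and $B$ with $0 \le A \le B$. The sketch below is a minimal check assuming NumPy; the function name `gaussian_gaps` and the sampled ranges are illustrative only.

```python
import numpy as np

def gaussian_gaps(A, B):
    """Return the three terms of the chain for 0 <= A <= B, elementwise:
    log(1+B) - log(1+A), (B-A)/(1+A), and B - A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    delta_i = np.log1p(B) - np.log1p(A)
    middle = (B - A) / (1.0 + A)
    delta_snr = B - A
    return delta_i, middle, delta_snr

rng = np.random.default_rng(3)
A = rng.uniform(0.0, 10.0, size=1000)
B = A + rng.uniform(0.0, 10.0, size=1000)
delta_i, middle, delta_snr = gaussian_gaps(A, B)
```

Averaging `delta_i` and `delta_snr` over channel realizations gives sample versions of the Gaussian-input bound on $\Delta I$ and of $\Delta\mathrm{SNR}$, respectively; the bound is tightest when $A$ is small, that is, at low SNR.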



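E. Proof of Theorem 5

We need the following proposition towards proving the theorem.

Proposition 3: For any nonnegative $\{\alpha_{j,k},\ j = 1, \cdots, M\}$ such that $\sum_{k=1}^{N} \alpha_{j,k} = 1$, we have

$$\max_{1 \le j \le M} \sum_{k=1}^{N} \alpha_{j,k}\, y_{j,k} \le \sum_{k_1=1}^{N} \cdots \sum_{k_M=1}^{N} \alpha_{1,k_1} \cdots \alpha_{M,k_M} \max\{y_{1,k_1}, \cdots, y_{M,k_M}\}. \qquad (24)$$

Proof: Note that $\max\{y_i, z\} \ge y_i$ and $\max\{y_i, z\} \ge z$ for any set of real numbers $\{y_i\}_{i=1,\cdots,N}$ and $z$. Thus, we have the inequality

$$\max\left\{ \sum_{i=1}^{N} \beta_i y_i,\ z \right\} \le \sum_{i=1}^{N} \beta_i \max\{y_i, z\} \qquad (25)$$

where $\beta_i \ge 0$ and $\sum_{i=1}^{N} \beta_i = 1$.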
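We now prove the proposition by induction. With $M = 1$, equality holds and the statement is trivially valid. If the proposition holds for some $M$, we then have

$$\begin{aligned}
\max_{1 \le j \le M+1} \sum_{k=1}^{N} \alpha_{j,k}\, y_{j,k}
&= \max\left\{ \sum_{k_{M+1}=1}^{N} \alpha_{M+1,k_{M+1}}\, y_{M+1,k_{M+1}},\ \max_{1 \le j \le M} \sum_{k=1}^{N} \alpha_{j,k}\, y_{j,k} \right\} \\
&\overset{(a)}{\le} \sum_{k_{M+1}=1}^{N} \alpha_{M+1,k_{M+1}} \max\left\{ y_{M+1,k_{M+1}},\ \sum_{k_1=1}^{N} \cdots \sum_{k_M=1}^{N} \alpha_{1,k_1} \cdots \alpha_{M,k_M} \max\{y_{1,k_1}, \cdots, y_{M,k_M}\} \right\} \\
&\overset{(b)}{\le} \sum_{k_1=1}^{N} \cdots \sum_{k_M=1}^{N} \alpha_{1,k_1} \cdots \alpha_{M,k_M} \sum_{k_{M+1}=1}^{N} \alpha_{M+1,k_{M+1}} \max\left\{ y_{M+1,k_{M+1}},\ \max\{y_{1,k_1}, \cdots, y_{M,k_M}\} \right\} \\
&= \sum_{k_1=1}^{N} \cdots \sum_{k_{M+1}=1}^{N} \alpha_{1,k_1} \cdots \alpha_{M+1,k_{M+1}} \max\{y_{1,k_1}, \cdots, y_{M+1,k_{M+1}}\}
\end{aligned}$$

where (a) follows by applying (25) on the first term in the max and the hypothesis in (24) on the second term, and (b) by applying (25) on the second term in the max. ■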
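Proposition 3 can be checked by brute force for small $M$ and $N$. The sketch below assumes NumPy; the function names `prop3_lhs` and `prop3_rhs` are hypothetical. The right-hand side of (24) is evaluated by enumerating all $N^M$ index tuples, so this is only practical for small problem sizes.

```python
from itertools import product
import numpy as np

def prop3_lhs(alpha, y):
    """max_j sum_k alpha[j, k] * y[j, k]."""
    return float(np.max(np.sum(alpha * y, axis=1)))

def prop3_rhs(alpha, y):
    """Sum over all (k_1, ..., k_M) of prod_j alpha[j, k_j] * max_j y[j, k_j]."""
    M, N = alpha.shape
    total = 0.0
    for ks in product(range(N), repeat=M):
        weight = np.prod([alpha[j, ks[j]] for j in range(M)])
        total += weight * max(y[j, ks[j]] for j in range(M))
    return float(total)

rng = np.random.default_rng(4)
M, N = 3, 4
alpha = rng.dirichlet(np.ones(N), size=M)  # each row is nonnegative and sums to one
y = rng.standard_normal((M, N))

lhs = prop3_lhs(alpha, y)
rhs = prop3_rhs(alpha, y)
```

One way to read the result: if $Y_j$ are independent random variables taking the value $y_{j,k}$ with probability $\alpha_{j,k}$, the left side is $\max_j E[Y_j]$ and the right side is $E[\max_j Y_j]$.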
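Proof of Theorem 5: To prove the theorem, we upper bound the average SNR at the receiver as

$$\begin{aligned}
E\left[ \frac{\rho N_c}{K} \max_{i,j} \sum_{m=1}^{N_t} \alpha_{jm} s_{im} \right]
&\overset{(a)}{\le} E\left[ \frac{\rho N_c}{K} \max_{j} \sum_{m=1}^{N_t} \alpha_{jm} \max_{i} s_{im} \right] \\
&\overset{(b)}{\le} E\left[ \frac{\rho N_c}{K} \sum_{m_1=1}^{N_t} \cdots \sum_{m_{N_2}=1}^{N_t} \alpha_{1,m_1} \cdots \alpha_{N_2,m_{N_2}} \max\left\{ \max_{i} s_{i m_1}, \cdots, \max_{i} s_{i m_{N_2}} \right\} \right] \\
&\overset{(c)}{\le} E\left[ \frac{\rho N_c}{K} \max\left\{ \max_{i} s_{i m_1^*}, \cdots, \max_{i} s_{i m_{N_2}^*} \right\} \right]
\end{aligned}$$

where (a) and (b) follow from Prop. 3 and (c) follows by letting

$$(m_1^*, \cdots, m_{N_2}^*) = \arg\max_{\{(m_1, \cdots, m_{N_2}) \,:\, 1 \le m_j \le N_t \text{ for all } j = 1, \cdots, N_2\}} E\left[ \max\left\{ \max_{i} s_{i m_1}, \cdots, \max_{i} s_{i m_{N_2}} \right\} \right].$$

The upper bound can be achieved by letting $\alpha^*_{jm} = \delta_{m_j^* m}$, which again does not depend on the channel realization. That is, the optimal codebook is of rank-one. ■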
Fig. 1. Average mutual information of different codebooks in a 2 x 2 i.i.d. channel.
Fig. 2. A magnified view of the performance of rank-one and rank-two codebooks in the 2 x 2 i.i.d. case.
Fig. 3. A magnified view of the performance of rank-one and rank-two codebooks in the 4 x 4 i.i.d. case.
Fig. 4. A magnified view of the performance of rank-one and rank-two codebooks in the 4 x 4 correlated case.



